Estimating conditional density of missing values using deep Gaussian mixture model
We consider the problem of estimating the conditional probability distribution of missing values given the observed ones. We propose an approach that combines the flexibility of deep neural networks with the simplicity of Gaussian mixture models (GMMs). Given an incomplete data point, our neural network returns the parameters of a Gaussian distribution (in the form of a Factor Analyzers model) representing the corresponding conditional density. We experimentally verify that our model provides a better log-likelihood than a conditional GMM trained in a typical way. Moreover, the imputation obtained by replacing missing values with the mean vector of our model looks visually plausible.
Comment: A preliminary version of this paper appeared as an extended abstract at the ICML 2020 Workshop on the Art of Learning with Missing Values
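To illustrate the kind of conditional density involved, the following is a minimal NumPy sketch of conditioning a joint Gaussian with a factor-analyzer covariance (Sigma = W W^T + diag(psi)) on the observed coordinates. The function names are illustrative, not taken from the paper's code; in the actual model a neural network predicts these parameters from the incomplete input.

```python
import numpy as np

def fa_covariance(W, psi):
    # Factor-analyzer covariance: low-rank term plus diagonal noise.
    return W @ W.T + np.diag(psi)

def conditional_gaussian(mu, Sigma, x, observed_mask):
    # Standard Gaussian conditioning of the missing coordinates on the
    # observed ones: p(x_m | x_o) is again Gaussian.
    o = np.where(observed_mask)[0]
    m = np.where(~observed_mask)[0]
    Soo = Sigma[np.ix_(o, o)]
    Smo = Sigma[np.ix_(m, o)]
    Smm = Sigma[np.ix_(m, m)]
    cond_mean = mu[m] + Smo @ np.linalg.solve(Soo, x[o] - mu[o])
    cond_cov = Smm - Smo @ np.linalg.solve(Soo, Smo.T)
    return cond_mean, cond_cov
```

Imputing with the mean vector, as in the paper, corresponds to returning `cond_mean` for the missing coordinates.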
The general framework for few-shot learning by kernel HyperNetworks
Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where only one element represents each class. We propose a general framework for few-shot learning via kernel HyperNetworks—the fusion of kernels and the hypernetwork paradigm. Firstly, we introduce the classical realization of this framework, dubbed HyperShot. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our models aim to switch the classification module parameters depending on the task's embedding. In practice, we utilize a hypernetwork, which takes the aggregated information from support data and returns the classifier's parameters handcrafted for the considered problem. Moreover, we introduce a kernel-based representation of the support examples delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between the support examples' embeddings instead of the backbone models' direct feature values. Thanks to this approach, our model can adapt to highly different tasks. While such a method obtains very good results, it is limited by typical problems such as poorly quantified uncertainty due to limited data size. We further show that incorporating Bayesian neural networks into our general framework, an approach we call BayesHyperShot, solves this issue.
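As a rough sketch of the data flow described above: support embeddings are turned into a kernel (Gram) matrix, a hypernetwork maps that matrix to classifier weights, and a query embedding is classified with the generated weights. The hypernetwork here is reduced to a single random linear map purely for illustration; all shapes and names are hypothetical, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(E, gamma=1.0):
    # Pairwise RBF kernel over support embeddings E of shape (n, d).
    sq = ((E[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

n_support, emb_dim, n_classes = 4, 8, 2
E = rng.normal(size=(n_support, emb_dim))   # backbone embeddings of the support set
K = rbf_kernel(E)                           # kernel-based task representation
# "Hypernetwork": here just one linear map from the flattened kernel
# matrix to the weights of a linear classifier.
H = rng.normal(size=(n_support * n_support, emb_dim * n_classes))
W_task = (K.ravel() @ H).reshape(emb_dim, n_classes)
query = rng.normal(size=emb_dim)
logits = query @ W_task                     # task-specific classification
```

The key point is that the classifier weights depend on relations between support embeddings (the kernel matrix), not on their raw feature values.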
Hypernetwork approach to Bayesian MAML
The main goal of Few-Shot learning algorithms is to enable learning from
small amounts of data. One of the most popular and elegant Few-Shot learning
approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this
method is to learn the shared universal weights of a meta-model, which are then
adapted for specific tasks. However, the method suffers from over-fitting and
poorly quantifies uncertainty due to limited data size. Bayesian approaches
could, in principle, alleviate these shortcomings by learning weight
distributions in place of point-wise weights. Unfortunately, previous
modifications of MAML are limited due to the simplicity of Gaussian posteriors,
MAML-like gradient-based weight updates, or by the same structure enforced for
universal and adapted weights.
In this paper, we propose a novel framework for Bayesian MAML called
BayesianHMAML, which employs Hypernetworks for weight updates. It learns the
universal weights point-wise, but a probabilistic structure is added when
adapted for specific tasks. In such a framework, we can use simple Gaussian
distributions or more complicated posteriors induced by Continuous Normalizing
Flows.
Comment: arXiv admin note: text overlap with arXiv:2205.1574
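A schematic of the adaptation step in the simple-Gaussian case, under stated assumptions: the universal weights are point estimates, while hypernetwork heads map a task embedding to the mean shift and log-standard-deviation of a Gaussian posterior over the adapted weights, sampled via the reparameterization trick. Names and shapes are illustrative; the paper also allows richer posteriors via Continuous Normalizing Flows.

```python
import numpy as np

rng = np.random.default_rng(1)

def bayesian_adapt(w_universal, task_emb, H_mu, H_logsig):
    # Hypernetwork heads map the task embedding to a Gaussian posterior
    # over the adapted weights: mean = universal + task-dependent shift,
    # with a per-weight standard deviation.
    mu = w_universal + task_emb @ H_mu
    log_sig = task_emb @ H_logsig
    eps = rng.normal(size=mu.shape)        # reparameterization trick
    return mu + np.exp(log_sig) * eps

d_emb, d_w = 4, 6
w = np.zeros(d_w)                          # point-wise universal weights
task_emb = rng.normal(size=d_emb)
H_mu = rng.normal(size=(d_emb, d_w))
H_logsig = np.zeros((d_emb, d_w))          # std fixed at 1 in this toy example
w_task = bayesian_adapt(w, task_emb, H_mu, H_logsig)
```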
Augmentation-aware Self-supervised Learning with Guided Projector
Self-supervised learning (SSL) is a powerful technique for learning robust
representations from unlabeled data. By learning to remain invariant to applied
data augmentations, methods such as SimCLR and MoCo are able to reach quality
on par with supervised approaches. However, this invariance may be harmful to
solving some downstream tasks which depend on traits affected by augmentations
used during pretraining, such as color. In this paper, we propose to foster
sensitivity to such characteristics in the representation space by modifying
the projector network, a common component of self-supervised architectures.
Specifically, we supplement the projector with information about augmentations
applied to images. In order for the projector to take advantage of this
auxiliary guidance when solving the SSL task, the feature extractor learns to
preserve the augmentation information in its representations. Our approach,
coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is
directly applicable to typical joint-embedding SSL methods regardless of their
objective functions. Moreover, it does not require major changes in the network
architecture or prior knowledge of downstream tasks. In addition to an analysis
of sensitivity towards different data augmentations, we conduct a series of
experiments, which show that CASSLE improves over various SSL methods, reaching
state-of-the-art performance in multiple downstream tasks.
Comment: Preprint under review. Code: https://github.com/gmum/CASSL
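The core modification can be sketched as conditioning the projector on the augmentation parameters, e.g. by concatenation before projection. The MLP below is a toy stand-in under illustrative shapes, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def guided_projector(representation, aug_params, W1, W2):
    # Conditional projector: augmentation information is concatenated to the
    # backbone representation before projection, so the SSL objective can be
    # solved without discarding augmentation-related features upstream.
    h = np.concatenate([representation, aug_params])
    h = np.maximum(W1 @ h, 0.0)            # ReLU hidden layer
    return W2 @ h

rep_dim, aug_dim, hid_dim, proj_dim = 16, 4, 32, 8
W1 = rng.normal(size=(hid_dim, rep_dim + aug_dim))
W2 = rng.normal(size=(proj_dim, hid_dim))
rep = rng.normal(size=rep_dim)             # feature-extractor output
aug = rng.normal(size=aug_dim)             # e.g. encoded crop/color-jitter parameters
z = guided_projector(rep, aug, W1, W2)
```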
Processing of incomplete data with convolutional neural networks
Training machine learning models on incomplete data is one of the most important problems in this research domain. The issue of missing data arises in many practical applications of such models, such as robotics or the processing of medical images. Adapting deep neural networks, such as convolutional neural networks, to the case of missing data remains an open problem, because naive single-imputation techniques often yield inaccurate results and are unable to convey the uncertainty of the imputation. In this work, we present MisConv, a method for processing images with missing data by deep convolutional neural networks. The proposed method is a generalization of the classical convolutional layer, able to process the parameters of the probability density of the missing data and compute the expected value of the network activation. For the known parts of images, this layer acts like a classical convolution.
An essential component of MisConv is the estimation of the distributions of the missing parts of images. For this purpose, we utilize the Deep Mixture of Factor Analyzers (DMFA) model, which uses a neural network to estimate the parameters of these distributions. We compare this approach with other popular models used for missing data imputation and show that DMFA can also be successfully trained on incomplete data. We evaluate the proposed method by training models for various image processing tasks (classification, reconstruction, and generation) on incomplete data. These target models are compared with models trained with other means of handling the missing data, using alternative imputation methods and different ways of passing the distribution of the missing parts to the target model. The conducted experiments indicate that convolutional neural networks equipped with the MisConv layer obtain better or similar results compared to other methods of processing missing data.
A fragment of this work describing the DMFA model was published as the paper "Estimating Conditional Density of Missing Values Using Deep Gaussian Mixture Model" at the "Artemiss: The Art of Learning with Missing Values" workshop during the International Conference on Machine Learning (ICML) in July 2020, as well as a paper of the same name at the International Conference on Neural Information Processing (ICONIP) in November 2020. The paper describing the MisConv layer, "MisConv: Convolutional Neural Networks for Missing Data", has been submitted to the Neural Information Processing Systems (NeurIPS) 2021 conference and is currently under review.
MisConv: convolutional neural networks for missing data
Processing of missing data by modern neural networks, such as CNNs, remains a
fundamental, yet unsolved challenge, which naturally arises in many practical
applications, like image inpainting or autonomous vehicles and robots. While
imputation-based techniques are still one of the most popular solutions, they
frequently introduce unreliable information to the data and do not take into
account the uncertainty of estimation, which may be destructive for a machine
learning model. In this paper, we present MisConv, a general mechanism for
adapting various CNN architectures to process incomplete images. By modeling
the distribution of missing values by the Mixture of Factor Analyzers, we cover
the spectrum of possible replacements and find an analytical formula for the
expected value of convolution operator applied to the incomplete image. The
whole framework is realized by matrix operations, which makes MisConv extremely
efficient in practice. Experiments performed on various image processing tasks
demonstrate that MisConv achieves superior or comparable performance to the
state-of-the-art methods.
Comment: Accepted for publication at the WACV 2022 Conference
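Because convolution is linear, the expected value of a convolution under any distribution over the missing pixels reduces to convolving the per-pixel means. The toy 1-D example below checks this identity by Monte Carlo; it is only an illustration of the underlying principle, while the actual MisConv layer works with a Mixture of Factor Analyzers and realizes everything as matrix operations.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv1d_valid(x, k):
    # Plain 'valid' 1-D convolution (correlation), enough for the identity below.
    n = len(x) - len(k) + 1
    return np.array([x[i:i + len(k)] @ k for i in range(n)])

kernel = np.array([0.25, 0.5, 0.25])
mean = np.array([1.0, 2.0, 0.0, 4.0, 5.0])   # per-pixel means; index 2 is "missing"
std = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # uncertainty only at the missing pixel

# Monte-Carlo estimate of E[conv(x)] versus the analytical conv(E[x]).
samples = mean + std * rng.normal(size=(50_000, 5))
mc = np.mean([conv1d_valid(s, kernel) for s in samples], axis=0)
analytic = conv1d_valid(mean, kernel)
```

The two estimates agree up to sampling noise, which is what lets MisConv replace sampling over imputations with a single closed-form pass.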
HyperShot : few-shot learning by kernel hypernetworks
Few-shot models aim at making predictions using a minimal number of labeled
examples from a given task. The main challenge in this area is the one-shot
setting where only one element represents each class. We propose HyperShot -
the fusion of kernels and hypernetwork paradigm. Compared to reference
approaches that apply a gradient-based adjustment of the parameters, our model
aims to switch the classification module parameters depending on the task's
embedding. In practice, we utilize a hypernetwork, which takes the aggregated
information from support data and returns the classifier's parameters
handcrafted for the considered problem. Moreover, we introduce the kernel-based
representation of the support examples delivered to hypernetwork to create the
parameters of the classification module. Consequently, we rely on relations
between embeddings of the support examples instead of direct feature values
provided by the backbone models. Thanks to this approach, our model can adapt
to highly different tasks.
Zero time waste in pre-trained early exit neural networks
The problem of reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded, with its computations effectively being wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various modes, datasets, and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other early exit methods. On the ImageNet dataset, it obtains superior results over the best baseline method in 11 out of 16 cases, reaching up to 5 percentage points of improvement on low computational budgets.
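The inference logic can be sketched as a cascade in which each IC's softmax output is ensembled with those of its predecessors, and the model exits once confidence crosses a threshold. The equal-weight averaging below is a simplification of the paper's ensembling; names and the threshold value are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ztw_inference(ic_logits, threshold=0.9):
    # ic_logits: per-IC logit vectors, ordered from earliest to latest IC.
    # Each IC combines its own prediction with its predecessors' outputs
    # (ensemble-like reuse) and triggers an early exit once the ensemble
    # is confident enough, so no IC computation is wasted.
    running = np.zeros_like(ic_logits[0], dtype=float)
    for i, logits in enumerate(ic_logits):
        running = (running * i + softmax(logits)) / (i + 1)
        if running.max() >= threshold:
            return running, i          # early exit at IC i
    return running, len(ic_logits) - 1  # fall through to the last IC

probs, exit_at = ztw_inference([np.array([5.0, 0.0]), np.array([0.0, 5.0])])
```

In this toy run the first IC is already confident, so the cascade exits immediately; an uncertain input would instead accumulate predictions from later ICs.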